

Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Neural Information Processing Systems

Restless bandit problems are instances of non-stationary multi-armed bandits. These problems have been well studied from the optimization perspective, where the goal is to efficiently find a near-optimal policy when system parameters are known. However, very few papers adopt a learning perspective, where the parameters are unknown. In this paper, we analyze the performance of Thompson sampling in episodic restless bandits with unknown parameters. We consider a general policy map to define our competitor and prove an $\tilde{O}(\sqrt{T})$ Bayesian regret bound. Our competitor is flexible enough to represent various benchmarks, including the best fixed action policy, the optimal policy, the Whittle index policy, and the myopic policy.
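The episodic Thompson sampling loop the abstract describes can be illustrated with a minimal, hypothetical sketch. This is not the paper's algorithm or analysis: it assumes two-state arms with unknown transition probabilities, fully observed states at decision time (a simplification), independent Beta priors on each transition parameter, and a myopic policy map standing in for the general policy map. At each episode the learner samples parameters from the posterior, runs the induced policy for the horizon, and updates the posterior from observed transitions of the played arm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setting: each arm is a 2-state Markov chain (0 = bad, 1 = good).
# Unknown parameter: true_p[arm, s] = P(next state is good | current state s).
n_arms, horizon, n_episodes = 3, 10, 200
true_p = rng.uniform(0.2, 0.8, size=(n_arms, 2))

# Beta(1, 1) priors on every transition probability.
alpha = np.ones((n_arms, 2))
beta = np.ones((n_arms, 2))

def myopic_policy(p_sample, states):
    # Play the arm most likely to land in the good state next step.
    return int(np.argmax(p_sample[np.arange(n_arms), states]))

total_reward = 0.0
for ep in range(n_episodes):
    # Thompson step: draw one full parameter set from the current posterior.
    p_sample = rng.beta(alpha, beta)
    states = np.ones(n_arms, dtype=int)  # episodic restart: all arms reset
    for t in range(horizon):
        a = myopic_policy(p_sample, states)
        # Every arm transitions (restless), whether played or not.
        new_states = (rng.random(n_arms)
                      < true_p[np.arange(n_arms), states]).astype(int)
        total_reward += new_states[a]  # reward 1 if the played arm is good
        # Posterior update only for the played arm's observed transition.
        alpha[a, states[a]] += new_states[a]
        beta[a, states[a]] += 1 - new_states[a]
        states = new_states
```

The key Thompson sampling ingredient is that a single posterior sample is held fixed for the whole episode, so the policy map is applied to one coherent parameter guess rather than re-sampled every step.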


Reviews: Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Neural Information Processing Systems

The restrictive assumption of restarting (which also significantly simplifies the regret analysis) was not mentioned. Note that the work by Liu et al.


Reviews: Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Neural Information Processing Systems

The reviewers liked this paper, and I did as well. One thought is whether or not Exp4 can be adapted to this setting. The translation is not immediate by any means, but perhaps this is worth thinking about. Please take the reviewers' suggestions into consideration for the final version, as promised in your response.




Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems

Jung, Young Hun, Tewari, Ambuj

Neural Information Processing Systems
